Randomized Multi-pass Streaming Skyline Algorithms

Authors

  • Atish Das Sarma
  • Ashwin Lall
  • Danupon Nanongkai
  • Jun Xu
Abstract

We consider external-memory algorithms for skyline computation without pre-processing. Our goal is to develop an algorithm with a good worst-case guarantee that also performs well on average. Due to the nature of disks, it is desirable that such algorithms access the input as a stream (even if in multiple passes). Using randomization, which has proved useful in many applications, we present an efficient multi-pass streaming algorithm, RAND, for skyline computation. As far as we are aware, RAND is the first randomized skyline algorithm in the literature. RAND is near-optimal for the streaming model, which we prove via a simple lower bound. Additionally, our algorithm is distributable and can handle partially ordered domains on each attribute. Finally, we demonstrate the robustness of RAND via extensive experiments on both real and synthetic datasets. RAND is comparable to existing algorithms in the average case and is additionally tolerant to simple modifications of the data, whereas other algorithms degrade considerably under such variation.
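
To make the notions of "skyline" and "multi-pass streaming" in the abstract concrete, here is a minimal Python sketch. It is not the RAND algorithm from the paper, whose details are not given on this page. It is an illustrative randomized multi-pass scheme under stated assumptions: points are tuples of numbers, smaller values are better on every attribute, and a re-iterable list stands in for the disk-resident stream. The names `dominates`, `covered`, and `multipass_skyline` are hypothetical.

```python
import random


def dominates(a, b):
    """True if a is no worse than b on every attribute and strictly
    better on at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def covered(p, skyline):
    """True if p equals or is dominated by a skyline point found so far."""
    return any(s == p or dominates(s, p) for s in skyline)


def multipass_skyline(points, seed=0):
    """Illustrative sketch only: finds one new skyline point per two passes.

    Memory holds the skyline found so far plus one candidate point;
    `points` is only ever read sequentially, front to back.
    """
    rng = random.Random(seed)
    skyline = []
    while True:
        # Pass 1: reservoir-sample one point not yet covered by the skyline.
        candidate, seen = None, 0
        for p in points:
            if covered(p, skyline):
                continue
            seen += 1
            if rng.randrange(seen) == 0:
                candidate = p
        if candidate is None:
            return skyline  # every point is covered, so we are done

        # Pass 2: replace the candidate by any point that dominates it.
        # After a full scan, nothing in the stream dominates the survivor.
        for p in points:
            if dominates(p, candidate):
                candidate = p
        skyline.append(candidate)


if __name__ == "__main__":
    data = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4), (2, 2), (6, 6)]
    print(sorted(multipass_skyline(data)))
```

The random choice only affects the order in which skyline points are discovered, not the final set. This sketch uses two passes per skyline point, which makes no claim to match the pass or space bounds the abstract attributes to RAND.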

Similar resources

Lot Streaming in No-wait Multi Product Flowshop Considering Sequence Dependent Setup Times and Position Based Learning Factors

This paper considers a no-wait multi-product flowshop scheduling problem with sequence-dependent setup times. Lot streaming divides the lots of products into portions called sublots in order to reduce lead times and work-in-process and to increase machine utilization rates. The objective is to minimize the makespan. To clarify the system, a mathematical model of the problem is presented. Sin...

Energy-efficient skyline query optimization in wireless sensor networks

With the deployment of wireless sensor networks (WSNs) for environmental monitoring and event surveillance, WSNs can be treated as virtual databases that respond to user queries. It thus becomes more urgent that such databases be able to support complicated queries such as skyline queries. The skyline query, which is one of the popular queries for multi-criteria decision making, has received much attention i...

Streaming algorithms for language recognition problems

We study the complexity of the following problems in the streaming model. Membership testing for DLIN. We show that every language in DLIN can be recognised by a randomized one-pass O(log n) space algorithm with inverse polynomial one-sided error, and by a deterministic p-pass O(n/p) space algorithm. We show that these algorithms are optimal. Membership testing for LL(k). For languages generated...

Maximum Matching in Semi-streaming with Few Passes

In the semi-streaming model, an algorithm receives a stream of edges of a graph in arbitrary order and uses a memory of size O(n polylog n), where n is the number of vertices of the graph. In this work, we present semi-streaming algorithms that perform one or two passes over the input stream for Maximum Matching with no restrictions on the input graph, and for the important special case of bipartit...

SkySuite: A Framework of Skyline-Join Operators for Static and Stream Environments

Efficient processing of skyline queries has been an area of growing interest over both static and stream environments. Most existing static and streaming techniques assume that the skyline query is applied to a single data source. Unfortunately, this is not true in many applications in which, due to the complexity of the schema, the skyline query may involve attributes belonging to multiple dat...

Journal:
  • PVLDB

Volume 2

Publication date: 2009